Statistical causes for the Epps effect in microstructure noise 
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We present two statistical causes for the distortion of correlations on high-frequency financial 
data. We demonstrate that the asynchrony of trades as well as the decimalization of stock prices 
has a large impact on the decline of the correlation coefficients towards smaller return intervals 
(Epps effect). These distortions depend on the properties of the time series and are of purely 
statistical origin. We are able to present parameter-free compensation methods, which we validate 
in a model setup. Furthermore, the compensation methods are applied to high-frequency empirical 
data from the NYSE's TAQ database. A major fraction of the Epps effect can be compensated. The 
contribution of the presented causes is particularly high for stocks that are traded at low prices. 
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I. INTRODUCTION 

The decline of calculated correlations in financial data
towards smaller return intervals was first discovered by
Thomas Epps in 1979 [1]. This behavior was subsequently
detected on different stock exchanges [2-5] and
foreign exchange markets [6]. The Epps effect has
received considerable attention from economists as well as
from mathematicians and theoretical physicists.

Hayashi and Yoshida [7] introduced a cumulative estimator
that only considers returns with overlapping time
intervals. Hence, it deals with the asynchrony of time
series as a cause for the Epps effect. Subsequently, Voev
and Lunde [8] demonstrated that this estimator can be
biased in the presence of noise and proposed a bias
correction. Griffin and Oomen [9] extended the estimator
of Hayashi and Yoshida by adjustments for lagged
correlations. The work of Toth and Kertesz [10] also deals
with the phenomenon of lagged correlations. They
introduce a model that is based on the decomposition of
cross-correlations. The recent study of Zhang [11] shows
that the usual previous-tick estimators are biased and
consequently provides an optimal sampling frequency of
returns to suppress the Epps effect. Barndorff-Nielsen et
al. [12, 13] examine high-frequency correlations and
propose multivariate realized kernels to significantly improve
the estimation of correlations. An extensive study of
microscopic causes leading to the Epps effect has been
performed by Reno [14].

Clearly, many mechanisms contribute to the Epps effect.
We demonstrate that there are two major causes
of purely statistical origin. Our aim is not to develop
a complete description of the Epps effect. Rather, we
want to identify statistical causes that can be compensated
directly, without the requirement of adjusting
parameters, model calibrations or an optimal sampling
frequency. The two major causes we identify are the
asynchrony of the time series and the impact of the
decimalization by the tick-size.

This paper is organized as follows. In sections II A and
II B we present compensation methods for the impact of
asynchronous time series and the impact of the tick-size.
This is followed by a combined compensation of both
effects in section II C. The results are validated in a model
setup in section III A. In section III B we apply the
compensation methods to empirical data from the NYSE's
TAQ database to estimate the impact of the statistical
causes on the Epps effect. We discuss our results in
section IV.



* michael@muennix.com 



II. THEORY OF COMPENSATING 
MICROSTRUCTURE NOISE DISTORTIONS 

In the sequel, we give an overview of compensation
methods that account for distortions of the Pearson
correlation coefficient due to statistical effects. In
particular, the asynchrony of trading times and the impact of
the tick-size are considered.



A. Asynchrony of trading times 

We begin by demonstrating how the asynchrony of
time series contributes to the Epps effect. By asynchrony
we refer to time series that feature an arbitrary lag at
any given point in time, while the average lag is zero. The
asynchrony is simply due to the non-synchronous pricing
of stocks. A detailed derivation and study of our finding
is performed in [15].

Toth and Kertesz [10] stated that the impact of the 
asynchrony is weak, compared to the impact of a static 
lag, for which they developed a model. In the following, 
we will demonstrate that the asynchrony can be a major 
cause for the Epps effect. 

The central assumption of this approach is the existence
of underlying non-lagged time series of prices. The
assumption of a finer (see, e.g., [10]) or continuous (see,
e.g., [11-13]) timescale is frequently used in the
estimation of correlations. This ansatz is also intuitive, as
most stocks are traded at several stock exchanges and
off-exchange (OTC) simultaneously. The basic idea is
the following: Due to the asynchrony, each term of the
Pearson correlation coefficient can be divided into a part
which contributes to the correlation and a part which is
uncorrelated and therefore lowers the correlation
coefficient.

The situation is sketched in Fig. 1. Here, $\gamma_i(t)$ refers
to the point of last trade for the $i$-th stock at time
$t$. The waiting times, that is, the periods between two
consecutive trades, are randomly (usually exponentially)
distributed. Hence, when calculating the return of the
interval from $t'$ to $t' + \Delta t$, we actually obtain the
return on an effective return interval which lies between the
points of last trade referring to the left and the right
side of the return interval, that is,
$[\gamma_1(t'), \gamma_1(t' + \Delta t)]$ and
$[\gamma_2(t'), \gamma_2(t' + \Delta t)]$. These intervals can be smaller
or larger than the initially chosen return interval. When
considering the returns of two stocks within the same
interval, one obtains two effective return intervals that
are in most cases not equal in length, start-point and
end-point. These intervals usually share an overlap $\Delta t_o$,
although this is not necessarily true for stocks that are
traded in low quantities. This overlap is given by

$$\Delta t_o(t') = \min\bigl(\gamma_1(t' + \Delta t), \gamma_2(t' + \Delta t)\bigr) - \max\bigl(\gamma_1(t'), \gamma_2(t')\bigr) . \tag{1}$$

FIG. 1. Illustration of the model for asynchronous trading
times of two stocks. Shown on the top are the prices $\tilde S$ for the
hypothetical underlying timescale. The "sampling" of these
prices to macroscopic prices $S$ with randomly distributed
points of trades is illustrated in the middle. The points of
trades are indicated by the vertical lines. The thick bars at
the bottom illustrate the return interval between $t'$ and $t' + \Delta t$.
The points of last trades on these times are denoted with $\gamma$.
The shaded area indicates the overlap $\Delta t_o$.

The fractional overlap $\Delta t_o / \Delta t$ declines with lower
return intervals as shown in Fig. 2. This scaling behavior
already looks similar to the Epps effect on correlation
coefficients. As we will demonstrate, the fractional overlap
is strongly connected to the Epps effect.

In the sequel, we consider the Epps effect on relative
price changes or arithmetic financial returns $r$, which are
defined as the relative price change during a return
interval $\Delta t$,

$$r(t) = \frac{S(t + \Delta t) - S(t)}{S(t)} , \tag{2}$$

where $S(t)$ refers to the price of a security at time $t$.

Regarding the hypothetical underlying time series, the
information within the overlap $\Delta t_o$ is synchronous. This
part gives the true correlation of the time series. In
contrast, the parts left and right of the overlap are
asynchronous. Under the assumption of randomly distributed
trading times, these parts are on average uncorrelated.
Hence, the returns outside the overlap distort the
correlation coefficient.

It follows that the contribution of these two single
returns to the Pearson correlation coefficient is the Pearson
correlation coefficient of the underlying time series
multiplied by the fractional overlap, as shown in [15]. This
can easily be outlined when considering the normalized
returns of the underlying time series

$$\tilde g(t) = \frac{\tilde r(t) - \langle \tilde r(t) \rangle}{\tilde \sigma} . \tag{3}$$
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FIG. 2. Average fractional overlap for the 5 highest correlated 
stock pairs of each industry branch in the S&P 500 index 
versus the return interval $\Delta t$.



Here, $\langle \cdots \rangle$ denotes the average over $T$ and $\sigma$ denotes
the standard deviation of the time series with length $T$.
The tilde indicates the underlying time series. When
calculating the Pearson correlation coefficient of two
time series, the overlap is a function of the time step:
$\Delta t_o = \Delta t_o(t)$. We denote the interval of the overlap
$\Delta t_o(t)$ for each time step as $J(t)$. The time steps on
the underlying timescale are denoted with $\tilde t$. We can
rearrange the terms of the correlation coefficient in terms
that originate from within the overlap interval and thus
are synchronous, and terms that are asynchronous,
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This leads to 



$$\mathrm{corr}(r_1, r_2) = \frac{1}{T} \sum_{j=1}^{T} \mathrm{corr}_{t_j}(\tilde g_1, \tilde g_2) \, \frac{\Delta t_o(t_j)}{\Delta t} , \tag{5}$$

where $\mathrm{corr}_t(\tilde g_1, \tilde g_2)$ is the Pearson correlation coefficient
of the underlying time series for the interval $[t, t + \Delta t]$.
It gives the true correlation. Each term of the
correlation coefficient is multiplied by the fractional overlap
$\Delta t_o(t) / \Delta t$, because only the information inside the
overlap contributes to the correlation coefficient.
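
The overlap of two effective return intervals can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the implementation used for the results in this paper: trade times are given as sorted lists, and the function names `last_trade_time` and `fractional_overlap` are hypothetical.

```python
from bisect import bisect_right

def last_trade_time(trade_times, t):
    """Return gamma(t): the latest trade time <= t, or None if there is none."""
    k = bisect_right(trade_times, t)
    return trade_times[k - 1] if k > 0 else None

def fractional_overlap(trades_1, trades_2, t, dt):
    """Fractional overlap dt_o / dt of the two effective return intervals
    [gamma_i(t), gamma_i(t + dt)] for the nominal interval [t, t + dt]."""
    g1_left, g1_right = last_trade_time(trades_1, t), last_trade_time(trades_1, t + dt)
    g2_left, g2_right = last_trade_time(trades_2, t), last_trade_time(trades_2, t + dt)
    if None in (g1_left, g1_right, g2_left, g2_right):
        return 0.0  # at least one stock has not traded yet
    overlap = min(g1_right, g2_right) - max(g1_left, g2_left)
    return max(overlap, 0.0) / dt  # intervals without overlap count as 0

# Stock 1 trades at 0, 7, 55, 58; stock 2 at 0, 20, 40, 70 (seconds).
# For the interval [10, 60] the effective intervals are [7, 58] and [0, 40].
example = fractional_overlap([0, 7, 55, 58], [0, 20, 40, 70], 10, 50)
```

In this example the shared part of the two effective intervals is [7, 40], so 33 of the 50 seconds of the nominal return interval carry synchronous information.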

As we are able to quantify the impact of the fractional 
overlap on the correlation coefficient, we can easily com- 
pensate this distortion by 



$$\mathrm{corr}_{\mathrm{async}}(r_1, r_2) = \frac{1}{T} \sum_{j=1}^{T} g_1(t_j) \, g_2(t_j) \, \frac{\Delta t}{\Delta t_o(t_j)} , \tag{6}$$



where g refers to the normalized return of the correspond- 
ing (non-hypothetical) time series. Furthermore, only re- 
turns should be considered that actually share an overlap, 
analogously to the estimator of Hayashi and Yoshida [7] . 
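
A minimal Python sketch of this compensation is given below. It assumes that the fractional overlap of every return pair has already been determined, that the normalization uses all returns, and that the average runs only over pairs with a non-vanishing overlap; the function names are hypothetical and the sketch is not the code used for the results presented here.

```python
from math import sqrt

def normalized(returns):
    """Normalize returns to zero mean and unit standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    std = sqrt(sum((r - mean) ** 2 for r in returns) / n)
    return [(r - mean) / std for r in returns]

def corr_async(r1, r2, frac_overlap):
    """Average of g1 * g2 weighted by the inverse fractional overlap.
    Return pairs that do not share an overlap are skipped."""
    g1, g2 = normalized(r1), normalized(r2)
    terms = [a * b / f for a, b, f in zip(g1, g2, frac_overlap) if f > 0.0]
    return sum(terms) / len(terms)

# With a fractional overlap of 1 for every pair, the estimator reduces
# to the ordinary Pearson correlation coefficient.
r1 = [0.010, -0.020, 0.015, 0.000]
r2 = [2.0 * r for r in r1]
plain = corr_async(r1, r2, [1.0] * len(r1))
```

Dividing each term by its fractional overlap scales up exactly those products that are diluted by asynchronous, on average uncorrelated, information.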

Initially, we made the assumption of an underlying
time series of prices, which is correlated and which
exists on a smaller time scale. Equation (6) no longer
depends on the time scale of the hypothetical underlying
time series, nor does it depend on the actual prices
on the underlying time series. Hence, the only necessary
assumption is that there exists underlying information
which is correlated on a finer time scale. This is an
important finding, since Martens and Poon [16] indicated
that the synchronization of returns from international
stock exchanges is a non-trivial problem.
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FIG. 3. Detail of the distribution of 5-min returns of the AES 
Corp. share in 2007. The shaded areas refer to returns that
originate from the same absolute price change $\Delta S$. The price
changes $\Delta S$ are denoted as multiples of the tick-size $q$.



B. Tick-Size 

We now turn to the second statistical cause. We
estimate the tick-size's impact on the Epps effect. A
comprehensive derivation and discussion is performed in [17].

The lowest possible price change, the tick-size, of most
securities has been constantly reduced, resulting in
tick-sizes of 1/100-th of the respective currency on most stock
exchanges. This process is often referred to as
decimalization. It was motivated, e.g., by the aim of an
enhanced market efficiency. In theory, small tick-sizes allow
for a faster clearing of market arbitrage. However, the
question whether a smaller tick-size generally improves
the market quality is controversially discussed [18, 19].
Among others, Harris [20] discussed that a larger
tick-size can ensure liquidity, but on the other hand, it can
lead to erroneous data in financial indices [21]. In this
context, Angel [22] observed that the prices of a stock are
commonly in a typical range, which is optimal to provide
liquidity. Companies perform stock splits to control the
absolute price of their share. A recent study by Onnela
et al. [23] indicates that in some cases only a fraction of
the theoretically possible prices are used. Hence, prices
cluster at certain multiples of the tick-size, resulting in an
effective tick-size.

At first glance, it could appear that the transition from
absolute price changes $\Delta S$ to returns $r$ removes this
discretization from the distribution, since the returns are
almost continuously distributed. A detailed look at the
center of a return distribution (see Fig. 3) reveals that
the discretization effects are still visible. Despite its poor
graphical visibility, the discretization affects returns on
all intervals. In particular, we expect an impact on the
correlation coefficient if the discretization is high, that is,
when stocks are traded at low prices.

Due to the imposing of discrete prices, information is 
lost. The average relative price change becomes smaller 
when considering smaller return intervals. The tick-size 
remains the same of course. The information loss grows 
for smaller return intervals. Thus, the discretization 
should also contribute to the Epps effect. 

The basic assumption of our model is that we can sta- 
tistically describe the discreteness in market prices by 
a discretization of a hypothetical underlying price. Of 
course, the market prices do not actually result from a 
discretization process. However, there is a large variety of 
trading strategies simultaneously acting on the market. 
These strategies also act on a large spectrum of different 
investment horizons. There are even traders that try to 
exploit the finite tick-size in their trading strategies. As
the price formation results from the interaction of this 
diversity of trading strategies, the price fluctuations on 
the level of the tick-size can be viewed as purely statis- 
tical. Hence, a natural approach is the assumption of on 
average uniformly distributed discretization errors. 

Using the arithmetic return defined in equation (2), we
introduce the discretization error $\vartheta(t)$ as

$$\bar r(t) = \frac{\Delta \bar S(t)}{\bar S(t)} \tag{7}$$

$$\phantom{\bar r(t)} = \frac{\Delta S(t) + \vartheta(t)}{\bar S(t)} \tag{8}$$

with

$$\Delta S(t) = S(t + \Delta t) - S(t) , \tag{9}$$

where $\bar S(t)$ denotes the discretized stock price and $\bar r(t)$
denotes the return, which is based on discretized stock
prices. We emphasize that we do not account for the
discretization of the prices $S(t)$, but consider only the
discretization of the price changes $\Delta S(t)$. We
demonstrate in section III A that this only induces a negligible
error. As $\vartheta$ in equation (8) is actually the difference of
two uniformly distributed discretization errors, it follows
a triangular distribution.
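
The triangular shape can be checked numerically. The following Python sketch assumes two independent rounding errors, each uniformly distributed on $[-q/2, q/2]$ for a tick-size $q$; their difference is then confined to $[-q, q]$ and has the variance $q^2/6$ of a symmetric triangular distribution. The function name and the sample size are chosen for illustration only.

```python
import random

def price_change_error(q, rng):
    """Difference of two independent rounding errors, each uniform on [-q/2, q/2]."""
    return rng.uniform(-q / 2, q / 2) - rng.uniform(-q / 2, q / 2)

rng = random.Random(0)   # fixed seed for reproducibility
q = 0.01                 # tick-size
errors = [price_change_error(q, rng) for _ in range(100_000)]

# Sample variance around the (known) zero mean.
variance = sum(e * e for e in errors) / len(errors)
# A symmetric triangular distribution on [-q, q] has variance q**2 / 6.
expected_variance = q ** 2 / 6
```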

The calculation of the Pearson correlation coefficient
including the discretization errors as introduced in (8)
leads to

$$\mathrm{corr}_{\mathrm{tick}}(r_1, r_2) = \frac{\mathrm{cov}(r_1, r_2)}{\sqrt{\mathrm{var}(r_1) \, \mathrm{var}(r_2)}} . \tag{10}$$

Here, $\bar r$ refers to the return with respect to discretized
prices. Only the terms $\mathrm{cov}(\bar r_1, \bar r_2)$, $\mathrm{var}(\bar r_1)$ and $\mathrm{var}(\bar r_2)$
can be calculated with the discretized prices. All other
terms are unknown and describe the information loss
due to the discretization. We can estimate these terms
and thereby compensate for the information loss by
interpolating the price change distribution. Technically,
this is achieved by expanding the variance and
covariance terms in equations (11) and (12) and estimating the
discretization error for all price changes individually.
Estimation techniques for the individual discretization
errors are comprehensively discussed in [17]. This study
indicates that only certain terms of equation (11) have
a noticeable impact on the compensation. If calculation
speed is an issue, one can approximate



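$$\mathrm{corr}_{\mathrm{tick}}(r_1, r_2) \approx \frac{\mathrm{cov}(\bar r_1, \bar r_2)}{\sigma_1 \sigma_2} , \tag{13}$$

where the standard deviations $\sigma_i$ are corrected according to
equation (12).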



The main contribution to the distortion of correlation
coefficients in small return intervals is the overestimation of
$\sigma$. Fig. 7 in section III A shows this overestimated $\bar\sigma$ and
the tick-size-corrected $\sigma$ versus the return interval $\Delta t$.
This is consistent with the findings of Hansen and Lunde
[24]. They demonstrate that the realized variance is
overestimated on small return intervals due to microstructure
noise. The empirical evidence in section III B indicates
that the tick-size has a profound impact on this noise.

Due to the convex shape of the price change distribu- 
tion, the discretization errors are not distributed sym- 
metrically. This effect grows with the impact of the dis- 
cretization, i.e., smaller return intervals. Thus, the es- 
timation of variances on the discretized values is biased. 
This gives the largest contribution to the distortion of 
correlation coefficients due to discretized data. We can 
correct this behavior with the presented compensation. 
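
The direction of this effect can be illustrated with a toy simulation in Python. It is a sketch under simplifying assumptions (a Gaussian random walk measured in units of the tick-size, prices rounded to the nearest tick, non-overlapping return intervals) and does not reproduce the estimation technique of [17]; all names and parameter values are illustrative.

```python
import random
from math import sqrt

def std(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def sigma_ratio(prices, dt):
    """Std of rounded price changes over dt divided by the std of the
    unrounded price changes (non-overlapping intervals)."""
    starts = range(0, len(prices) - dt, dt)
    true_changes = [prices[k + dt] - prices[k] for k in starts]
    tick_changes = [round(prices[k + dt]) - round(prices[k]) for k in starts]
    return std(tick_changes) / std(true_changes)

rng = random.Random(0)
prices = [1000.0]
for _ in range(200_000):
    # one step of the underlying price; step std = 0.2 ticks (assumption)
    prices.append(prices[-1] + rng.gauss(0.0, 0.2))

ratio_short = sigma_ratio(prices, 1)    # short return interval
ratio_long = sigma_ratio(prices, 100)   # long return interval
```

In this toy setup the ratio of discretized to underlying standard deviation is far above one for the short interval and close to one for the long interval, in line with the qualitative statement above.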



C. Combined compensation 

Having presented compensation methods for distor- 
tions of the correlation coefficient due to asynchronous 
time series and due to the tick-size, we now combine both 
findings. The compensation of asynchrony acts on each 
term of the Pearson correlation coefficient for every point 
in time. The tick-size compensation, in contrast, acts on 
the Pearson correlation coefficient as a whole in terms 
of the time series, but it acts on every occurring price 
change individually. Both effects superimpose, as
illustrated in Fig. 4. The horizontal axis shows the
product of normalized 1-min returns for each point in 2007
(overnight returns are excluded). The vertical axis shows
the corresponding fractional overlap of each return pair. 
The discretization effects are visible in the center, super- 
imposed with the asynchronous characteristics. Similar 
to the findings of Szpiro [25] for single stocks, the
tick-size induces a nanostructure on the terms of the Pearson
correlation coefficient for two stocks.

FIG. 4. Product of normalized return pairs versus
their fractional overlap for 1-min returns of the shares of
Novell Inc. and Unisys Corp. in 2007. The average fractional
overlap is 0.76.

FIG. 5. Model results of compensation methods.

A. Model results 



The simultaneous compensation of both effects can be 
achieved by combining both presented compensations. It 
reads 



— f \ /-- At 
corr(ri,r 2 ) = < r i r 2^- 



(14) 
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(15) 



Analogously to the previous section, this expression can 
be approximated by 



cori(ri,r 2 ) « — - — 



(16) 



By multiplying the covariance terms of discretized
returns $\bar r$ with the inverse fractional overlap $\Delta t / \Delta t_o$ and
by correcting the overestimation of the standard
deviations $\sigma$, the largest fraction of the correlation coefficient's
distortion can be compensated.
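
The verbal description above can be turned into a small Python sketch. It is an assumption-laden illustration rather than the exact approximation used in this paper: the tick-size-corrected standard deviations are taken as given inputs (their estimation is discussed in [17]), and the function name `corr_combined` is hypothetical.

```python
def corr_combined(r1_bar, r2_bar, frac_overlap, sigma1, sigma2):
    """Covariance terms of discretized returns, each multiplied by the
    inverse fractional overlap, divided by the corrected standard
    deviations sigma1 and sigma2 (supplied by the caller)."""
    n = len(r1_bar)
    m1 = sum(r1_bar) / n
    m2 = sum(r2_bar) / n
    terms = [(a - m1) * (b - m2) / f
             for a, b, f in zip(r1_bar, r2_bar, frac_overlap)
             if f > 0.0]  # skip return pairs without overlap
    return sum(terms) / len(terms) / (sigma1 * sigma2)

# Sanity check: full overlap and uncorrected sample standard deviations
# give back the ordinary Pearson correlation coefficient.
r1_bar = [1.0, -1.0, 2.0, -2.0]
r2_bar = [0.5 * r for r in r1_bar]
s1 = (sum(r * r for r in r1_bar) / len(r1_bar)) ** 0.5
s2 = 0.5 * s1
check = corr_combined(r1_bar, r2_bar, [1.0] * 4, s1, s2)
```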



III. RESULTS 

Before applying the method to empirical data, we 
study it in a model setup. Subsequently, we apply 
the compensation methods to empirical data from the 
NYSE's TAQ database to estimate the impact of the pre- 
sented causes on the distortion of correlation coefficients. 



We start by generating an underlying correlated time
series using a GARCH(1,1) model, as introduced in [26],

$$r_i(t) = \sigma_i(t) \left( \sqrt{c} \, \eta(t) + \sqrt{1 - c} \, \varepsilon_i(t) \right) . \tag{17}$$

Here, $r_i(t)$ stands for the return of the $i$-th stock at time $t$
and $c$ is the correlation coefficient. The random variables
$\eta(t)$ and $\varepsilon_i(t)$ are taken from standard normal
distributions. $\eta(t)$ is identical for all stocks; it induces the
correlation. The $\varepsilon_i$ are individual for each stock. $\sigma_i^2(t)$ is the
non-constant variance, given by a GARCH(1,1) process

$$\sigma_i^2(t) = \alpha_0 + \alpha_1 r_i^2(t - 1) + \beta_1 \sigma_i^2(t - 1) . \tag{18}$$

The initial parameters of the GARCH(1,1) process are
chosen as $\alpha_0 = 2.4 \times 10^{-4}$, $\alpha_1 = 0.15$ and $\beta_1 = 0.84$.

Two return time series $r_1$ and $r_2$ are generated,
representing two correlated stocks. The total length of
these time series is chosen as $7.2 \times 10^6$, corresponding
to a return interval $\Delta t$ of one second during one trading
year. From these returns, we generate two underlying
price time series $S_1$ and $S_2$. We set the starting prices at
$t = 0$ to 1000. The correlation coefficient $c$ is chosen as 0.4.
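
The generation of the two correlated return series can be sketched in Python as follows. The sketch follows the GARCH(1,1) recursion and the parameter values quoted above, but the initialization at the unconditional variance, the random seed and the function name `garch_pair` are assumptions made for this illustration.

```python
import random
from math import sqrt

def garch_pair(n, c, alpha0=2.4e-4, alpha1=0.15, beta1=0.84, seed=0):
    """Generate two return series of length n that share the common
    factor eta and have individual GARCH(1,1) variances."""
    rng = random.Random(seed)
    # Assumption: both processes start at the unconditional variance.
    var = [alpha0 / (1.0 - alpha1 - beta1)] * 2
    returns = ([], [])
    for step in range(n):
        eta = rng.gauss(0.0, 1.0)          # common factor, induces the correlation
        for i in range(2):
            if step > 0:                   # GARCH(1,1) update of the variance
                var[i] = alpha0 + alpha1 * returns[i][-1] ** 2 + beta1 * var[i]
            eps = rng.gauss(0.0, 1.0)      # individual noise of stock i
            returns[i].append(
                sqrt(var[i]) * (sqrt(c) * eta + sqrt(1.0 - c) * eps))
    return returns

r1, r2 = garch_pair(1000, 0.4)
```

For $c = 1$ the individual noise drops out and both series coincide, which provides a simple consistency check of the construction.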

To model the asynchronous trade processes, these 
prices are sampled independently using exponentially dis- 
tributed waiting times with average values typical for the 
stock market. We choose the average waiting times as 15 
and 25 data points (equivalent to seconds in this setup). 
In the next step, we round the prices to integer values. 
An integer price of, for example, 1000 then corresponds 
to a price of 10 and a tick-size of 0.01. 
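
The sampling and rounding steps can be sketched in Python as follows. The sketch assumes that the first trade takes place at the first data point and that a trade is registered at the first data point at or after its exponentially distributed waiting time has elapsed; the function name `sample_trades` and the short linear price path are purely illustrative.

```python
import random

def sample_trades(prices, mean_wait, rng):
    """Previous-tick sampling of an underlying price series.
    Returns the rounded observed prices and, for every data point,
    the index of the last trade (the point of last trade)."""
    observed, last_trade = [], []
    next_trade = 0.0                      # assumption: first trade at t = 0
    for t in range(len(prices)):
        if t >= next_trade:               # a trade occurs at this data point
            last = t
            next_trade = t + rng.expovariate(1.0 / mean_wait)
        observed.append(round(prices[last]))   # discretize to integer ticks
        last_trade.append(last)
    return observed, last_trade

rng = random.Random(1)
underlying = [1000.0 + 0.3 * k for k in range(200)]   # illustrative price path
observed, last_trade = sample_trades(underlying, 15, rng)
```

Calling the function twice with different average waiting times (15 and 25 data points in the setup described above) yields two independently sampled, discretized price series together with their points of last trade.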

Finally, we construct the time series of returns from
these prices using return intervals from 60 data points
(corresponding to 1 minute) to 1800 data points
(corresponding to 30 minutes). The time series obtained in
this way feature both asynchrony and discretization.
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The results of the applied compensation methods are 
shown in Fig. 5. We are able to compensate the statistical
distortion of correlation coefficients almost completely.
The remaining decline of the corrected correlation
coefficient on very small return intervals is due to the
approximations presented in sections II B and II C (only price
change discretization is considered) as well as the
neglect of the correlation between price changes and prices
and of the discretization of prices. The impact of the
overestimation of the standard deviation $\sigma$ is shown in Fig.
7. This illustrates that the tick-size can have a large
impact on the overestimation of $\sigma$. Moreover, we are able
to compensate for this behavior down to approximately
$\Delta t = 180$ time steps (corresponding to 3 minutes in our
model).



B. Empirical evidence 

As already mentioned, many mechanisms contribute to 
the Epps effect. Our present aim is to quantify the part, 
which is caused by the statistical properties of the time 
series. 

It is difficult to isolate the Epps effect on single stock 
pairs, as it is superimposed with other effects leading to 
other characteristics of the correlation coefficient than 
expected for the Epps effect. 

Because of that, we classify two ensembles of stock
pairs. After compensating the asynchrony effect for each
pair, we build the average for the ensemble by
normalizing the correlation coefficients individually by their
saturation value at a return interval of 30 minutes. We also
plot the error bars of the compensation representing the
double standard deviation $2\sigma$ of the correction. By this
method, we can show the scope of the asynchrony model
and identify regions in which other effects dominate. All
data is extracted from the NYSE's TAQ database for the
year 2007.
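
The normalization and averaging step can be sketched in Python as follows. The data layout (one dictionary per stock pair, mapping the return interval in minutes to the correlation coefficient) and the function name `ensemble_average` are assumptions for this illustration, and the numbers are made up.

```python
def ensemble_average(curves, saturation_interval=30):
    """Normalize each correlation curve by its value at the saturation
    interval and average the normalized curves over the ensemble."""
    intervals = sorted(curves[0])
    normalized = [{dt: curve[dt] / curve[saturation_interval] for dt in intervals}
                  for curve in curves]
    return {dt: sum(curve[dt] for curve in normalized) / len(normalized)
            for dt in intervals}

# Two illustrative stock pairs with different saturation values.
curves = [
    {1: 0.20, 10: 0.36, 30: 0.40},
    {1: 0.10, 10: 0.18, 30: 0.20},
]
average = ensemble_average(curves)
```

Normalizing before averaging prevents strongly correlated pairs from dominating the ensemble curve.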

The first ensemble consists of 5 stock pairs of each
industry sector of the S&P 500 index (50 stock pairs in total),
whose daily returns provide the strongest correlation
during the year 2007. We applied the asynchrony
compensation to this ensemble. The results shown in Fig. 6(a)
indicate that the asynchrony has a pronounced impact on the
Epps effect. It appears that asynchrony effects are the
dominating cause for the Epps effect on return intervals
down to approximately 10 minutes, where the remaining
Epps effect is on average less than 3% of the correlation
coefficient's saturation value at large return intervals. Of
course, within the statistical ensemble stock pairs can be
found which either do not show an Epps effect or which
are so infrequently traded that the assumption of an
underlying timeline might be unreasonable. Even though
the assumption of an underlying time series is a common
and intuitive approach, it may not be valid for stocks
traded at very low frequencies.

The second ensemble serves as a test scenario for the 
tick-size compensation. We expect the tick-size to only 
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(a)Asynchrony compensation 
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(b)Tick-size compensation 
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(c)Combined compensation 

FIG. 6. Empirical results of applied compensation methods.
(a) represents the average over the 50 highest correlated stock
pairs in the S&P 500 index in 2007 (top 5 from each industry
branch). (b) and (c) represent the average over the 25 highest
correlated stock pairs that were traded between $10.01 and
$20.00 in 2007. The correlation coefficients have been
individually normalized to the corrected value at Δt = 30 min.

FIG. 7. Overestimated standard deviation $\bar\sigma$ and
tick-size-corrected standard deviation $\sigma$ versus the return interval $\Delta t$
within the model.



have a large impact on the correlation coefficient if the
discretization is high, i.e., if stocks are traded at low
prices. Thus, we construct the second ensemble out of
the 25 most strongly correlated stock pairs in the S&P 500
that are traded in the range of $10.01 to $20.00. The
price change distributions are segment-wise interpolated
with heavy-tailed distributions as suggested, e.g., by [27].
The results in Fig. 6(b) indicate that for stocks that
are traded at low prices, the tick-size can have a sizable
impact on the Epps effect.

Finally, we apply a combined compensation to the
second ensemble. Results are shown in Fig. 6(c). The
empirical evidence indicates that statistical effects can
have a very profound impact on the Epps effect.

are traded at low prices. The asynchrony of time series
as well as the tick-size have a major impact on the Epps
effect. We developed two simple methods to compensate
for these causes.

However, this is not a full description of the Epps effect
as there are certainly many phenomena contributing to
it. In certain scenarios, other statistical properties of
the time series or other causes for the Epps effect might
dominate. The size of the error bars in Fig. 6 indicates
that the asynchrony compensation does not give reliable
results for return intervals below 3 minutes. Especially
for very small return intervals, a lag between the time
series of two stocks might be the dominating cause, as
suggested by Toth and Kertesz [10].

For stocks that are infrequently traded at very low
prices (often referred to as penny stocks) the
assumption of uniformly distributed discretization errors needs
to be carefully reconsidered. It is possible that certain
trading strategies dominate for those stocks, leading to an
asymmetrical distribution of discretization errors.

Nonetheless, the presented compensations significantly
improve the estimation of financial correlations. These
methods do not require parameter adjustments or model
calibrations. Our empirical study indicates that the
identified causes can contribute up to 75% of the Epps effect
for stocks that are traded at low prices.
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